Unsupervised Models of Text Structure

Author

  • Annie Louis
Abstract

Models of text structure are necessary for applications that generate text. These models provide information about what content fits together and how to organize that content as coherent text. In some domains, such as newswire, biographies, and stories for children, texts tend to have similar content and structure. Such regularities have allowed the development of unsupervised methods that learn text structure from human-written examples in these domains. We survey some of the recently proposed approaches in this area and review their use in different text generation tasks.

First, we consider approaches with a focus on computational semantics. We review work that aims to discover patterns of related events from news articles and children’s stories, and we consider one application of such knowledge: an automatic story-telling system.

Next, we move to methods that focus on coherence and organization. We describe these in the context of two generation tasks: sentence ordering and the creation of long articles. For the sentence ordering problem, we survey approaches targeted at learning properties of coherent transitions between adjacent sentences in texts. Then we consider the generation of long biographical descriptions, surveying recent work on automatically generating such articles using higher-level patterns in text structure, such as subtopics and their organization.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-11-16. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/959

Similar resources

Accurate Unsupervised Learning of Field Structure Models for Information Extraction

The applicability of current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field structured extraction tasks, small amounts of prior knowledge can be used to effectively learn models in a primarily unsupervised fashion. Many text information sources exhibit a latent field structure: such documents can be viewed as...

Exploring Temporal Patterns in Emergency Department Triage Notes with Topic Models

Topic modeling is an unsupervised machine-learning task of discovering topics, the underlying thematic structure in a text corpus. Dynamic topic models are capable of analysing the time evolution of topics. This paper explores the application of dynamic topic models on emergency department triage notes to identify particular types of disease or injury events, and to detect the temporal nature o...

Optimization of sediment rating curve coefficients using evolutionary algorithms and unsupervised artificial neural network

The sediment rating curve (SRC) is a conventional and common regression model for estimating the suspended sediment load (SSL) of flow discharge. However, in most cases the log-transformation of the data in SRC models causes a bias that leads to underestimated SSL predictions. In this study, using the daily stream flow and suspended sediment load data from Shalman hydrometric station on Shalmanroud River, Guilan Pro...

Discovering Latent Structure in Task-Oriented Dialogues

A key challenge for computational conversation models is to discover latent structure in task-oriented dialogue, since it provides a basis for analysing, evaluating, and building conversational systems. We propose three new unsupervised models to discover latent structures in task-oriented dialogues. Our methods synthesize hidden Markov models (for underlying state) and topic models (to connect...

Publication date: 2010